75 resultados para Big data

em Deakin Research Online - Australia


Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the arrival of Big Data Era, properly utilizing the power of big data is becoming increasingly essential for the strength and competitiveness of businesses and organizations. We are facing grand challenges from big data from different perspectives, such as processing, communication, security, and privacy. In this talk, we discuss the big data challenges in network traffic classification and our solutions to the challenges. The significance of the research lies in the fact that each year the network traffic increase exponentially on the current Internet. Traffic classification has wide applications in network management, from security monitoring to quality of service measurements. Recent research tends to apply machine-learning techniques to flow statistical feature based classification methods. In this talk, we propose a series of novel approaches for traffic classification, which can improve the classification performance effectively by incorporating correlated information into the classification process. We analyze the new classification approaches and their performance benefit from both theoretical and empirical perspectives. A large number of experiments are carried out on two real-world traffic datasets to validate the proposed approach. The results show the traffic classification performance can be improved significantly even under the extreme difficult circumstance of very few training samples. Our work has significant impact on security applications.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper introduces and investigates large iterative multitier ensemble (LIME) classifiers specifically tailored for big data. These classifiers are very large, but are quite easy to generate and use. They can be so large that it makes sense to use them only for big data. They are generated automatically as a result of several iterations in applying ensemble meta classifiers. They incorporate diverse ensemble meta classifiers into several tiers simultaneously and combine them into one automatically generated iterative system so that many ensemble meta classifiers function as integral parts of other ensemble meta classifiers at higher tiers. In this paper, we carry out a comprehensive investigation of the performance of LIME classifiers for a problem concerning security of big data. Our experiments compare LIME classifiers with various base classifiers and standard ordinary ensemble meta classifiers. The results obtained demonstrate that LIME classifiers can significantly increase the accuracy of classifications. LIME classifiers performed better than the base classifiers and standard ensemble meta classifiers.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Big data presents a remarkable opportunity for organisations to obtain critical intelligence to drive decisions and obtain insights as never before. However, big data generates high network traffic. Moreover, the continuous growth in the variety of network traffic due to big data variety has rendered the network to be one of the key big data challenges. In this article, we present a comprehensive analysis of big data variety and its adverse effects on the network performance. We present taxonomy of big data variety and discuss various dimensions of the big data variety features. We also discuss how the features influence the interconnection network requirements. Finally, we discuss some of the challenges each big data variety dimension presents and possible approach to address them.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Stochastic search techniques such as evolutionary algorithms (EA) are known to be better explorer of search space as compared to conventional techniques including deterministic methods. However, in the era of big data like most other search methods and learning algorithms, suitability of evolutionary algorithms is naturally questioned. Big data pose new computational challenges including very high dimensionality and sparseness of data. Evolutionary algorithms' superior exploration skills should make them promising candidates for handling optimization problems involving big data. High dimensional problems introduce added complexity to the search space. However, EAs need to be enhanced to ensure that majority of the potential winner solutions gets the chance to survive and mature. In this paper we present an evolutionary algorithm with enhanced ability to deal with the problems of high dimensionality and sparseness of data. In addition to an informed exploration of the solution space, this technique balances exploration and exploitation using a hierarchical multi-population approach. The proposed model uses informed genetic operators to introduce diversity by expanding the scope of search process at the expense of redundant less promising members of the population. Next phase of the algorithm attempts to deal with the problem of high dimensionality by ensuring broader and more exhaustive search and preventing premature death of potential solutions. To achieve this, in addition to the above exploration controlling mechanism, a multi-tier hierarchical architecture is employed, where, in separate layers, the less fit isolated individuals evolve in dynamic sub-populations that coexist alongside the original or main population. Evaluation of the proposed technique on well known benchmark problems ascertains its superior performance. The algorithm has also been successfully applied to a real world problem of financial portfolio management. Although the proposed method cannot be considered big data-ready, it is certainly a move in the right direction.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recently, the Big Data paradigm has received considerable attention since it gives a great opportunity to mine knowledge from massive amounts of data. However, the new mined knowledge will be useless if data is fake, or sometimes the massive amounts of data cannot be collected due to the worry on the abuse of data. This situation asks for new security solutions. On the other hand, the biggest feature of Big Data is "massive", which requires that any security solution for Big Data should be "efficient". In this paper, we propose a new identity-based generalized signcryption scheme to solve the above problems. In particular, it has the following two properties to fit the efficiency requirement. (1) It can work as an encryption scheme, a signature scheme or a signcryption scheme as per need. (2) It does not have the heavy burden on the complicated certificate management as the traditional cryptographic schemes. Furthermore, our proposed scheme can be proven-secure in the standard model. © 2014 Elsevier Inc. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Smart grid is a technological innovation that improves efficiency, reliability, economics, and sustainability of electricity services. It plays a crucial role in modern energy infrastructure. The main challenges of smart grids, however, are how to manage different types of front-end intelligent devices such as power assets and smart meters efficiently; and how to process a huge amount of data received from these devices. Cloud computing, a technology that provides computational resources on demands, is a good candidate to address these challenges since it has several good properties such as energy saving, cost saving, agility, scalability, and flexibility. In this paper, we propose a secure cloud computing based framework for big data information management in smart grids, which we call 'Smart-Frame.' The main idea of our framework is to build a hierarchical structure of cloud computing centers to provide different types of computing services for information management and big data analysis. In addition to this structural framework, we present a security solution based on identity-based encryption, signature and proxy re-encryption to address critical security issues of the proposed framework.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As a leading framework for processing and analyzing big data, MapReduce is leveraged by many enterprises to parallelize their data processing on distributed computing systems. Unfortunately, the all-to-all data forwarding from map tasks to reduce tasks in the traditional MapReduce framework would generate a large amount of network traffic. The fact that the intermediate data generated by map tasks can be combined with significant traffic reduction in many applications motivates us to propose a data aggregation scheme for MapReduce jobs in cloud. Specifically, we design an aggregation architecture under the existing MapReduce framework with the objective of minimizing the data traffic during the shuffle phase, in which aggregators can reside anywhere in the cloud. Some experimental results also show that our proposal outperforms existing work by reducing the network traffic significantly.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the explosion of big data, processing large numbers of continuous data streams, i.e., big data stream processing (BDSP), has become a crucial requirement for many scientific and industrial applications in recent years. By offering a pool of computation, communication and storage resources, public clouds, like Amazon's EC2, are undoubtedly the most efficient platforms to meet the ever-growing needs of BDSP. Public cloud service providers usually operate a number of geo-distributed datacenters across the globe. Different datacenter pairs are with different inter-datacenter network costs charged by Internet Service Providers (ISPs). While, inter-datacenter traffic in BDSP constitutes a large portion of a cloud provider's traffic demand over the Internet and incurs substantial communication cost, which may even become the dominant operational expenditure factor. As the datacenter resources are provided in a virtualized way, the virtual machines (VMs) for stream processing tasks can be freely deployed onto any datacenters, provided that the Service Level Agreement (SLA, e.g., quality-of-information) is obeyed. This raises the opportunity, but also a challenge, to explore the inter-datacenter network cost diversities to optimize both VM placement and load balancing towards network cost minimization with guaranteed SLA. In this paper, we first propose a general modeling framework that describes all representative inter-task relationship semantics in BDSP. Based on our novel framework, we then formulate the communication cost minimization problem for BDSP into a mixed-integer linear programming (MILP) problem and prove it to be NP-hard. We then propose a computation-efficient solution based on MILP. The high efficiency of our proposal is validated by extensive simulation based studies.

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Big data is the large or complex data that exceed the processing capacity of conventional data processing systems. This book provides a big picture in this broad research area, covering all the phases of its value chains. The authors have attempted to survey most of the relevant technologies in each phrase of big data. The book is recommended for readers interested in advanced research in big data, also for industry practitioners who are interested in building big data applications. If the reader is not with necessary technical background, complementary readings may be needed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Big data is an emerging hot research topic due to its pervasive application in human society, such as government, climate, finance, and science. Currently, most research work on big data falls in data mining, machine learning, and data analysis. However, these amazing top-level killer applications would not be possible without the underneath support of networking due to their extremely large volume and computing complexity, especially when real-time or near-real-time applications are demanded.